You are viewing the RapidMiner Studio documentation for version 10.0.

Local Interpretation (LIME) (Operator Toolbox)

Synopsis

This meta operator approximates the decision a given (complex) model made for specific examples. The key idea is to generate local feature weights (interpretations). These are easier to interpret and can help explain, example by example, why a complex model reached its decision.

Description

To do this, the operator runs the following algorithm:

  • 1. Draw uniformly distributed random data in [0,1] for all attributes.
  • 2. Scale the random data to the same min/max range as the input data.
  • 3. Score the random examples with the complex model.
  • For each example in your input set:
    • 4. Calculate the (normalized) Euclidean distance d between the example and each random example.
    • 5. Use w = sqrt(exp(-d^2 / kernel_width^2)) as the weight of each random example for the local model.
    • 6. Run the inner process, which creates feature weights (e.g. via one of the Weight by operators, Linear Regression, or Logistic Regression).
    • 7. Add the top k attributes and their importance to the input set.
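Steps 1 to 5 can be sketched in plain Python as follows. This is a minimal illustration, not the operator's implementation. It assumes purely numerical attributes and treats the complex model as any callable that maps a row to a numerical score. It also assumes that "normalized" distance means each attribute difference is divided by that attribute's min/max range. The helper names `sample_and_score` and `kernel_weights` are hypothetical.

```python
import math
import random


def sample_and_score(data, model, sample_size, seed=None):
    """Steps 1-3: draw uniform random rows, rescale them to the input min/max, score them."""
    rng = random.Random(seed)  # local random generator, comparable to a local random seed
    n_atts = len(data[0])
    # Min/max per attribute are estimated from the input ExampleSet.
    mins = [min(row[j] for row in data) for j in range(n_atts)]
    maxs = [max(row[j] for row in data) for j in range(n_atts)]
    samples = []
    for _ in range(sample_size):
        u = [rng.random() for _ in range(n_atts)]  # step 1: uniform values in [0, 1)
        # step 2: rescale each value into the attribute's [min, max] range
        samples.append([mins[j] + u[j] * (maxs[j] - mins[j]) for j in range(n_atts)])
    scores = [model(x) for x in samples]  # step 3: score with the complex model
    return samples, scores, mins, maxs


def kernel_weights(example, samples, mins, maxs, kernel_width):
    """Steps 4-5: range-normalized Euclidean distance, then w = sqrt(exp(-d^2 / kernel_width^2))."""
    weights = []
    for x in samples:
        d2 = 0.0
        for j in range(len(example)):
            span = (maxs[j] - mins[j]) or 1.0  # guard against constant attributes
            d2 += ((example[j] - x[j]) / span) ** 2
        weights.append(math.sqrt(math.exp(-d2 / kernel_width ** 2)))
    return weights


# Usage with a toy two-attribute data set and a stand-in "complex" model.
data = [[0.0, 10.0], [1.0, 20.0], [0.5, 15.0]]
model = lambda x: 2 * x[0] + 0.1 * x[1]
samples, scores, mins, maxs = sample_and_score(data, model, 50, seed=1)
weights = kernel_weights(data[0], samples, mins, maxs, kernel_width=0.5)
```

Random examples close to the explained example receive weights near 1, and distant ones decay toward 0. This keeps the local model focused on the neighborhood of that example.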

As a result, you get an ExampleSet with new attributes describing the most important local attributes, plus a collection of attribute weights containing the full weight vector for each example. If you calculate a performance vector in the inner process and connect it to the Per port, the performance of each local description is attached to its example.
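Steps 6 and 7 are configurable in RapidMiner, because the inner process decides how the feature weights are computed. The plain-Python sketch below rests on two assumptions, and the operator may do something different internally. First, the inner process is a weighted least-squares linear regression. Second, an attribute's importance is the absolute value of its coefficient. The helpers `weighted_linear_regression` and `top_k` are hypothetical names, and `k` plays the role of the number_of_attributes parameter.

```python
def weighted_linear_regression(X, y, w):
    """Step 6 (assumed learner): solve (X^T W X) beta = X^T W y with an intercept column."""
    rows = [[1.0] + list(x) for x in X]  # prepend a constant column for the intercept
    n, p = len(rows), len(rows[0])
    # Build the weighted normal equations A * beta = b.
    A = [[sum(w[i] * rows[i][r] * rows[i][c] for i in range(n)) for c in range(p)] for r in range(p)]
    b = [sum(w[i] * rows[i][r] * y[i] for i in range(n)) for r in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("singular system: too few or degenerate weighted samples")
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, p))) / A[r][r]
    return beta[0], beta[1:]


def top_k(names, coefficients, k):
    """Step 7 (assumed ranking): keep the k attributes with the largest absolute weight."""
    ranked = sorted(zip(names, coefficients), key=lambda t: abs(t[1]), reverse=True)
    return ranked[:k]


# Usage: recover a known linear relationship from weighted samples.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1 + 3 * a - 2 * b for a, b in X]
intercept, coefs = weighted_linear_regression(X, y, [1.0, 0.5, 0.8, 0.3, 1.0])
print(top_k(["a", "b"], coefs, k=1))
```

The full coefficient vector corresponds to what the wei port delivers per example. The `top_k` result corresponds to the attributes added to the ExampleSet.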

The algorithm is very similar to LIME (Local Interpretable Model-agnostic Explanations). Details on LIME can be found here:
  • https://homes.cs.washington.edu/~marcotcr/blog/lime/
  • https://arxiv.org/pdf/1602.04938v1.pdf
  • https://github.com/marcotcr/lime

Input

  • exa (Data Table)

    The ExampleSet you want interpretations for. It needs to be reasonably large so that the minimum and maximum of each attribute can be estimated.

  • mod (Model)

    The (complex) input model.

Output

  • exa (Data Table)

    ExampleSet with local interpretations.

  • mod (Model)

    The input model, passed through.

  • wei

    A collection of Attribute Weights for each example.

  • loc

    The collection of local models.

Parameters

  • use_locality_heuristics If set to true, the locality heuristic derived from LIME, 0.2 * sqrt(number of attributes), is used; otherwise the locality has to be set manually.
  • locality A factor describing how local the model should be. The smaller this value, the more localized the model. It is used as kernel_width in step 5.
  • sample_size Number of random examples drawn to build the local models on.
  • number_of_attributes Number of attributes (k) added to the ExampleSet in step 7. All attribute weights are delivered via the wei port.
  • weight_threshold In each iteration, all random examples with a weight smaller than this threshold are removed. This drops irrelevant (non-local) random examples from learning and can significantly speed up the operator.
  • use_local_random_seed Indicates whether a local random seed should be used.
  • local_random_seed If use_local_random_seed is checked, this parameter determines the local random seed.
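The interplay of use_locality_heuristics, locality, and weight_threshold can be illustrated with a small Python sketch. It takes the heuristic 0.2 * sqrt(number of attributes) from the parameter description and the kernel from step 5. It assumes that weight_threshold keeps random examples whose weight is greater than or equal to the threshold, but the exact boundary handling is not documented. The function names are hypothetical.

```python
import math


def default_locality(n_attributes):
    """Locality heuristic from use_locality_heuristics: 0.2 * sqrt(#attributes)."""
    return 0.2 * math.sqrt(n_attributes)


def kernel_weight(d, kernel_width):
    """Step 5 weight for a normalized distance d, with locality used as kernel_width."""
    return math.sqrt(math.exp(-(d ** 2) / kernel_width ** 2))


def filter_by_threshold(samples, weights, weight_threshold):
    """Drop random examples whose weight is below weight_threshold (boundary assumed inclusive)."""
    kept = [(s, w) for s, w in zip(samples, weights) if w >= weight_threshold]
    return [s for s, _ in kept], [w for _, w in kept]


# Usage: 4 attributes, as in the Iris data set used by the tutorial processes.
kw = default_locality(4)
print(kw, kernel_weight(0.5, 0.2), kernel_weight(0.5, 1.0))
print(filter_by_threshold(["r1", "r2", "r3"], [0.9, 0.05, 0.5], 0.1))
```

For the same distance, a smaller locality produces a smaller weight, which is why smaller values give a more localized model. More random examples then fall below weight_threshold and can be skipped.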

Tutorial Processes

Use Linear Regression to Interpret Deep Learning

Deep Learning Model on Iris interpreted by local Linear Regressions.

Use Weight by Gini Index to Interpret a GBT

Get local interpretations by using Weight by Gini Index for a GBT model trained on Iris.

Use an Optimized Decision Tree to Explain a GBT

This tutorial interprets GBT results on the Iris data set. To do this, it optimizes the depth of a decision tree using a nested Optimize Parameters operator.